Abstract — The "state-of-the-art" in Length Limited Huffman Coding algorithms is the $\Theta(ND)$-time, $\Theta(N)$-space one of Hirschberg and Larmore, where $D < N$ is the length restriction on the code. This is a very clever, very problem-specific technique. In this note we show that there is a simple Dynamic-Programming (DP) method that solves the problem with the same time and space bounds. The fact that there was a $\Theta(ND)$ time DP algorithm was previously known; it is a straightforward DP with the Monge property (which permits an order of magnitude speedup). It was not interesting, though, because it also required $\Theta(ND)$ space.

The main result of this paper is the technique developed for 
reducing the space. It is quite simple and applicable to many 
other problems modeled by DPs with the Monge property. We 
illustrate this with examples from web-proxy design and wireless 
mobile paging. 

Index Terms — Prefix-Free Codes, Huffman Coding, Dynamic Programming, Web-Proxies, Wireless Paging, the Monge property.



I. Introduction 

Optimal prefix-free coding, or Huffman coding, is a standard compression technique. Given an encoding alphabet $\Sigma = \{\sigma_1, \ldots, \sigma_r\}$, a code is just a set of words in $\Sigma^*$. Given n probabilities or nonnegative frequencies $\{p_i : 1 \le i \le n\}$ and an associated code $\{w_1, w_2, \ldots, w_n\}$, the cost of the code is $\sum_{i=1}^{n} p_i |w_i|$, where $|w_i|$ denotes the length of $w_i$. A code is prefix-free if no codeword $w_i$ is a prefix of any other codeword $w_j$. An optimal prefix-free code for $\{p_i : 1 \le i \le n\}$ is a prefix-free code that minimizes its cost among all prefix-free codes.
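The two definitions above can be made concrete with a short Python sketch; the function names `code_cost` and `is_prefix_free` are illustrative only and do not come from the paper. The prefix test relies on the fact that, after sorting the codewords lexicographically, a word that is a prefix of another word is immediately followed by a word that starts with it.

```python
def code_cost(p, code):
    """Cost sum_i p_i * |w_i| of code = [w_1, ..., w_n] (codewords as strings)."""
    return sum(p_i * len(w_i) for p_i, w_i in zip(p, code))


def is_prefix_free(code):
    """True if no codeword is a prefix of a different codeword."""
    words = sorted(code)
    # If a is a prefix of some other word, the word right after a in
    # lexicographic order also starts with a, so adjacent pairs suffice.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


print(code_cost([3, 1, 1], ["0", "10", "11"]))  # 7
print(is_prefix_free(["0", "10", "11"]))        # True
print(is_prefix_free(["0", "01"]))              # False
```

Duplicate codewords are reported as not prefix-free, since a word is trivially a prefix of its copy.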

In [1], Huffman gave the now classical $O(n \log n)$ time algorithm for solving this problem. If the $p_i$'s are given in sorted order, Huffman's algorithm can be improved to $O(n)$ time [2]. In this note we will always assume that the $p_i$'s are presorted and that $p_1 \le p_2 \le \cdots \le p_n$.
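One standard way to obtain the $O(n)$ bound for presorted input is the two-queue method, sketched below in Python as an illustration (it is not necessarily the algorithm of [2], and `huffman_cost_sorted` is a hypothetical name). Merged weights are produced in nondecreasing order, so the two smallest remaining weights always sit at the fronts of the two queues; the cost of the resulting Huffman tree equals the sum of the weights of its internal nodes.

```python
from collections import deque


def huffman_cost_sorted(p):
    """Optimal binary prefix-free code cost for p sorted ascending, O(n) time."""
    if len(p) < 2:
        return 0
    leaves, merged = deque(p), deque()

    def pop_smallest():
        # Prefer a leaf on ties; either choice yields an optimal cost.
        if not merged or (leaves and leaves[0] <= merged[0]):
            return leaves.popleft()
        return merged.popleft()

    cost = 0
    while len(leaves) + len(merged) > 1:
        w = pop_smallest() + pop_smallest()
        cost += w           # cost = sum of all internal-node weights
        merged.append(w)    # merged weights arrive in nondecreasing order
    return cost


print(huffman_cost_sorted([1, 1, 2, 2, 4, 5, 9]))  # 60
```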

In some applications, it is desirable that the lengths of all code words be bounded by a constant, i.e., $|w_i| \le D$ where D is given. The problem of finding the minimal cost prefix-free code among all codes satisfying this length constraint is the length-limited Huffman coding (LLHC) problem, which we will consider here. Fig. 1 gives an example of inputs for which the Huffman code is not the same as the length-limited Huffman code.

The first algorithm for LLHC was due to Karp [3] in 1961; his algorithm is based on integer linear programming (ILP), which, using standard ILP solving techniques, leads to an exponential time algorithm. Gilbert [4] in 1971 was interested in this problem because of the issue of inaccurately known sources; since the probabilities $p_i$ are not known precisely, a set of codes with limited length will, in some sense, be "safe". The algorithm presented in [4] was an enumeration one and therefore also runs in exponential time. In 1972 Hu and Tan [5] developed an $O(nD2^D)$ time Dynamic Programming (DP) algorithm. The first polynomial time algorithm, running in $O(n^2 D)$ time and using $O(n^2 D)$ space, was presented by Garey in 1974 [6]. Garey's algorithm was based on a DP formulation similar to that developed by Knuth for deriving optimal binary search trees in [7] and hence only works for binary encoding alphabets. A decade later, Larmore [8] gave an algorithm running in $O(n^{3/2} D \log^{1/2} n)$ time and using $O(n^{3/2} D \log^{-1/2} n)$ space. This algorithm is a hybrid of [?] and [6], and therefore also only works for the binary case. This was finally improved by Larmore and Hirschberg [9], who gave a totally different algorithm running in $O(nD)$ time and using $O(n)$ space. In that paper, the authors first transform the length-limited Huffman coding problem to the Coin Collector's problem, a special type of Knapsack problem, and then solve the Coin Collector's problem by what they name the Package-Merge algorithm. Their result is a very clever special-case algorithm developed for this specific problem.

Theoretically, Larmore and Hirschberg's result was later superseded for the case $D = \omega(\log n)$¹ by two algorithms based on the parametric search paradigm [10]. The algorithm by Aggarwal, Schieber and Tokuyama [11] runs in $O(n\sqrt{D \log n} + n \log n)$ time and $O(n)$ space. A later improvement by Schieber [12] runs in $n 2^{O(\sqrt{\log D \log\log n})}$ time and uses $O(n)$ space. These algorithms are very complicated, though, and even for $D = \omega(\log n)$, the Larmore-Hirschberg algorithm is the one used in practice [13], [14]. For completeness, we point out that the algorithms of [9], [11], [12] are all only claimed for the binary ($r = 2$) case, but they can be extended to work for the non-binary ($r > 2$) case using observations similar to those we provide in Appendix A for the derivation of a DP for the generic r-ary LLHC problem.

Shortly after [9] appeared, Larmore and Przytycka [15], [16], in the context of parallel programming, gave a simple dynamic programming formulation for the binary Huffman coding problem. Although their DP was for regular Huffman coding and not the LLHC problem, we will see that it is quite easy to modify their DP to model the LLHC problem. It is then straightforward to show that their formulation also permits constructing the optimal tree in $O(nD)$ time by constructing a size-$O(nD)$ DP table. This is done in Section II. This straight DP approach would not be as good as the Larmore-Hirschberg one, though, because, like many DP algorithms, it requires maintaining the entire DP table to permit backtracking to construct the solution, which would require $\Theta(nD)$ space.

The main result of this note is the development of a simple technique (Section III) that permits reducing the DP space consumption down to $O(n)$, thus matching the Larmore-Hirschberg performance with a straightforward DP model. Our technique is not restricted to length-limited coding. It can be used to reduce space from $O(nD)$ to $O(n + D)$ in a variety of $O(nD)$ time DPs in the literature. In Section IV we illustrate with examples from the D-median on a line problem (placing web proxies on a linear topology network) [17] and wireless paging [18].

¹ $f(n) = \omega(g(n))$ if, for every $c > 0$, there exists N such that $f(n) > c\,g(n)$ for all $n > N$.

II. The Dynamic Programming Formulation 

Set $S_0 = 0$ and $S_m = \sum_{i=1}^{m} p_i$ for $1 \le m \le n$. Larmore and Przytycka [16] formulated the binary Huffman coding problem as a DP (1) where $H(0) = 0$ and, for $0 < i < n$,

$$H(i) = \min_{\max\{0,\,2i-n\} \le j < i} \left( H(j) + S_{2i-j} \right). \qquad (1)$$

In this DP, $H(n-1)$ is the cost of the optimal Huffman code. Another version of this DP, generalized for unequal-cost binary coding alphabets, also appeared in [19].
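DP (1) can be transcribed directly. The Python sketch below (the name `huffman_cost_dp` is ours) fills the table naively in $O(n^2)$ time, rather than with the CLWS machinery used in [16], and returns $H(n-1)$.

```python
def huffman_cost_dp(p):
    """Binary Huffman cost via DP (1), evaluated naively in O(n^2) time."""
    p = sorted(p)
    n = len(p)
    S = [0]
    for x in p:
        S.append(S[-1] + x)  # S[m] = p_1 + ... + p_m
    H = [0] + [float("inf")] * (n - 1)
    for i in range(1, n):
        # j ranges over max(0, 2i - n) <= j < i, so 2i - j <= n always
        H[i] = min(H[j] + S[2 * i - j] for j in range(max(0, 2 * i - n), i))
    return H[n - 1]


print(huffman_cost_dp([1, 1, 2, 2, 4, 5, 9]))  # 60
```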

It is straightforward to modify (1) to model the binary LLHC problem. The resulting DP is

$$H(d,i) = \begin{cases} 0 & d = 0,\ i = 0, \\ \infty & d = 0,\ 0 < i < n, \\ \min_{0 \le j \le i} \left( H(d-1,j) + c^{(d)}_{j,i} \right) & d > 0,\ 0 \le i < n, \end{cases} \qquad (2)$$

where $H(D, n-1)$ will denote the cost of the optimal length-limited Huffman code and

$$c^{(d)}_{j,i} = \begin{cases} 0 & i = j = 0, \\ S_{2i-j} & \max\{0,\,2i-n\} \le j < i, \\ \infty & \text{otherwise.} \end{cases} \qquad (3)$$



In the next subsection we will see an interpretation of this DP (which also provides an interpretation of (1)). In order to make this note self-contained, a complete derivation of the DP for the r-ary alphabet case is provided in Appendix A.

As far as running time is concerned, (1) appears, a priori, to require $O(n^2)$ time to fill in its corresponding DP table. [16] used the inherent concavity of $S_m$ to reduce this time down to $O(n)$ by transforming the problem to an instance of the Concave Least Weight Subsequence (CLWS) problem and using one of the known $O(n)$ time algorithms, e.g., [20], for solving that problem.

Similarly, (2) appears, a priori, to require $O(n^2 D)$ time to fill in its DP table. We will see that we may again use the concavity of $S_m$ to reduce this down by an order of magnitude, to $O(nD)$, by using the SMAWK algorithm [21] for finding row minima of matrices as a subroutine. Unlike the CLWS algorithms, the SMAWK one is very simple to code, and very efficient implementations are available in different packages, e.g., [22], [23]. In the conclusion to this note, after the application of the technique becomes understandable, we will explain why [16] needed to use the more complicated CLWS routine to solve the basic DP while we can use the simpler SMAWK one.

The $O(nD)$ DP algorithm for solving the LLHC problem, while seemingly never explicitly stated in the literature, was known as folklore. Even though it is much simpler to implement than the $O(nD)$ Package-Merge algorithm of Larmore and Hirschberg [9], it suffers from the drawback of requiring $\Theta(nD)$ space. The main contribution of this note is the observation that its space can be reduced down to $O(n + D)$, making it comparable with Package-Merge. Note that since, for the LLHC problem, we may trivially assume $D < n$, this implies a space requirement of $O(n)$. Furthermore, our space improvement will work not only for the LLHC problem but for all DPs in form (2) where the $c^{(d)}_{j,i}$ satisfy a particular property.

A. The Meaning of the DP

We quickly sketch the meaning of the DP (2) for the binary case. Figures 1 and 2 illustrate this sketch. We note that, in order to stress the parts important to our analysis, our formalism is a bit different than that of [16], [19]. A complete derivation of the DP for the r-ary case, with the appropriate general versions of the lemmas and observations stated below along with their proofs, is provided in Appendix A.

It is standard that there is a one-to-one correspondence between binary prefix-free codes with n words and binary trees with n leaves. The edges from an internal node to its children are labeled by 0 or 1. Each leaf corresponds to a code word, which is the concatenation of the characters on the root-to-leaf path. The cost of the code equals the weighted external path length of the tree. So we are really interested in finding a binary tree with minimum weighted external path length.

Denote the height of the tree by h. The bottommost leaves are on level 0; the root is on level h. Optimal assignments of the $p_i$'s to the leaves always assign smaller-valued $p_i$'s to leaves at lower levels.

A node in a binary tree is complete if it has two children, and a tree is complete if all of its internal nodes are complete. A min-cost tree must be complete, so we restrict ourselves to complete trees. A complete tree T of height h can be completely represented by a sequence $(i_0, i_1, \ldots, i_h)$, where $i_k$ denotes the number of internal nodes at levels $\le k$. Note that, by definition, $i_0 = 0$ and $i_h = n - 1$. Also note that every level $k \ge 1$ must contain at least one internal node, so $i_0 < i_1 < \cdots < i_h$. Finally, it is straightforward (see Appendix A) to show that the total number of leaves on levels $< k$ is $2i_k - i_{k-1}$, so $2i_k - i_{k-1} \le n$ for all k. For technical reasons, because we will be dealing with trees having height at most h (but not necessarily equal to h), we allow initial padding of the sequence by 0s, so a sequence representing a tree will be of the form $(i_0, i_1, \ldots, i_h)$ with the following properties.

Definition 1: Sequence $(i_0, i_1, \ldots, i_h)$ is valid if

• $\exists\, t \ge 0$ such that $i_0 = i_1 = \cdots = i_t = 0$,

• $0 < i_{t+1} < i_{t+2} < \cdots < i_h \le n - 1$, and

• $2i_k - i_{k-1} \le n$ for all $1 \le k \le h$.
Fig. 1. Two trees and their corresponding sequences I and codes. The left tree has sequence $I_1 = (0, 1, 3, 4, 5, 6)$. The right tree has sequence $I_2 = (0, 2, 4, 5, 6)$. Note that, for both trees, $2i_k - i_{k-1}$ is the number of leaves below level k. For input frequencies $(p_1, \ldots, p_7) = (1, 1, 2, 2, 2, 4, 5, 9)$, the left tree is an optimal Huffman code while the right tree is an optimal length-limited Huffman code for D = 4. Note that we allow padding sequences with initial 0s, so the right tree could also be represented by sequences (0, 0, 2, 4, 5, 6), (0, 0, 0, 2, 4, 5, 6), etc.
Fig. 2. Solving the DP in equation (2) for $(p_1, \ldots, p_7) = (1, 1, 2, 2, 2, 4, 5, 9)$ with D = 4. $H(d,i)$ is the value defined by (2); $J(d,i)$ is the index j for which the value $H(d,i)$ in (2) is achieved. The circled entries yield the sequence (0, 2, 4, 5, 6) (the 6 comes from the fact that we are calculating $H(4,6)$), which is exactly the sequence $I_2$ from Figure 1. The right-hand tree in Figure 1 is therefore an optimal length-limited Huffman code for D = 4.



A sequence is complete if it is valid and $i_h = n - 1$.

We can rewrite the cost function for a tree in terms of its 
complete sequence. 

Lemma 1: If complete sequence $(i_0, i_1, \ldots, i_h)$ represents a tree, then the cost of the tree is $\sum_{k=1}^{h} S_{2i_k - i_{k-1}}$. (Note that padding complete sequences with initial 0s does not change the cost of the sequence.)
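As a small check of the tree-sequence correspondence and of Lemma 1, the Python sketch below recovers $(i_0, \ldots, i_h)$ from the leaf depths (codeword lengths) of a complete binary tree, tests Definition 1, and evaluates $\sum_k S_{2i_k - i_{k-1}}$ with the $p_i$ sorted in ascending order. The helpers `sequence_from_lengths`, `is_valid` and `sequence_cost` are hypothetical names introduced for illustration.

```python
from collections import Counter


def sequence_from_lengths(lengths):
    """(i_0, ..., i_h) of the complete binary tree with these leaf depths."""
    h = max(lengths)
    leaves = Counter(lengths)         # leaves[t] = number of leaves at depth t
    internal = [0] * (h + 1)          # internal[t] = internal nodes at depth t
    nodes_below = leaves[h]           # every node at depth h is a leaf
    for t in range(h - 1, -1, -1):
        assert nodes_below % 2 == 0, "tree is not complete"
        internal[t] = nodes_below // 2          # two children per internal node
        nodes_below = leaves[t] + internal[t]   # all nodes at depth t
    seq, total = [], 0
    for k in range(h + 1):            # level k corresponds to depth h - k
        total += internal[h - k]
        seq.append(total)             # internal nodes on levels <= k
    return seq


def is_valid(seq, n):
    """Definition 1: zero padding, then strictly increasing, 2i_k - i_{k-1} <= n."""
    pairs = list(zip(seq, seq[1:]))
    return (seq[0] == 0 and seq[-1] <= n - 1
            and all(b > a or a == b == 0 for a, b in pairs)
            and all(2 * b - a <= n for a, b in pairs))


def sequence_cost(seq, p):
    """Lemma 1 / Definition 2: sum over k of S_{2 i_k - i_{k-1}}."""
    S = [0]
    for x in sorted(p):
        S.append(S[-1] + x)
    return sum(S[2 * b - a] for a, b in zip(seq, seq[1:]))


print(sequence_from_lengths([4, 4, 4, 4, 3, 3, 1]))               # [0, 2, 4, 5, 6]
print(sequence_cost([0, 2, 4, 5, 6], [1, 1, 2, 2, 4, 5, 9]))      # 60
```

The sequence $(0, 3, 4, 5)$ passes `is_valid` for n = 6 even though it is not a tree, which is exactly the issue addressed by Lemma 2.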

We may mechanically extend this cost function to all valid 
sequences as follows. 

Definition 2: For valid $I = (i_0, i_1, \ldots, i_h)$, set

$$\mathrm{cost}(I) = \sum_{k=1}^{h} S_{2i_k - i_{k-1}}.$$

I is optimal if $\mathrm{cost}(I) = \min_{I'} \mathrm{cost}(I')$, where the minimum is taken over all valid length-h sequences $I' = (i'_0, i'_1, \ldots, i'_h)$ with $i'_h = i_h$, i.e., all valid sequences of the same length that end with the same value.



Our goal is to find optimal trees by using the DP to optimize over valid sequences. An immediate issue is that not all complete sequences represent trees; e.g., $I = (0, 3, 4, 5)$ is complete for n = 6 but does not represent a tree, since it would have $2 \cdot 3 - 0 = 6$ leaves below level 1 but only $2 \cdot 4 - 3 = 5$ leaves below level 2. The saving fact is that even though not all complete sequences represent trees, all optimal complete sequences do.

Lemma 2: An optimal valid sequence ending in $i_h = n - 1$ always represents a tree.

Thus, to solve the LLHC problem of finding an optimal tree of height $\le D$, we only need to find an optimal valid sequence of length h = D ending with $i_D = n - 1$ (reconstructing the tree from the sequence can be done in $O(n)$ time). In the DP defined by equations (2) and (3), $H(d, j)$ clearly models the recurrence for finding an optimal valid sequence $(i_0, i_1, \ldots, i_d)$ of length d with $i_d = j$, so this DP solves the problem.

Note that, a priori, filling in the DP table $H(\cdot,\cdot)$ one entry at a time seems to require $O(n^2 D)$ time. We will now sketch the standard way of reducing this time down to $O(nD)$. Before doing so we must distinguish between the value problem and the construction problem. The value problem is to calculate the value of $H(D, n-1)$. The construction problem is to construct an optimal valid sequence $I = (I_0, I_1, \ldots, I_D)$ with $I_D = n - 1$ and $\mathrm{cost}(I) = H(D, n-1)$. This requires backtracking through the DP table by setting $I_0 = 0$, $I_D = n - 1$ and finding $I_1, I_2, \ldots, I_{D-1}$ such that

$$\forall\, 0 < d \le D: \quad H(d, I_d) = H(d-1, I_{d-1}) + c^{(d)}_{I_{d-1}, I_d}. \qquad (4)$$
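For reference, the straightforward (folklore) approach can be sketched in Python as follows; `llhc_naive` is our name, not the paper's. It fills every entry of (2) with the costs (3) in $O(n^2 D)$ time, stores the full argmin table J, and then backtracks using (4). Keeping all of J is exactly the $\Theta(nD)$ space that the rest of this note removes. Ties are broken toward the smallest j, which is a choice of this sketch.

```python
INF = float("inf")


def llhc_naive(p, D):
    """Fill DP (2)-(3) entry by entry, then backtrack through J as in (4)."""
    p = sorted(p)
    n = len(p)
    S = [0]
    for x in p:
        S.append(S[-1] + x)

    def c(j, i):  # equation (3)
        if i == j == 0:
            return 0
        if max(0, 2 * i - n) <= j < i:
            return S[2 * i - j]
        return INF

    H = [[0] + [INF] * (n - 1)]  # row d = 0
    J = [[None] * n]
    for d in range(1, D + 1):
        row, arg = [], []
        for i in range(n):
            best, bj = min((H[d - 1][j] + c(j, i), j) for j in range(i + 1))
            row.append(best)
            arg.append(bj)
        H.append(row)
        J.append(arg)  # the Theta(nD) table needed for backtracking
    if H[D][n - 1] == INF:
        return INF, None  # no code with all lengths <= D
    seq = [n - 1]  # I_D = n - 1
    for d in range(D, 0, -1):
        seq.append(J[d][seq[-1]])  # I_{d-1} = J(d, I_d)
    return H[D][n - 1], seq[::-1]


print(llhc_naive([1, 1, 2, 2, 4, 5, 9], 3))  # (63, [0, 3, 5, 6])
```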

B. Solving the Value Problem in $O(nD)$ Time

Definition 3: An $n \times m$ matrix M is Monge² if, for $0 \le i < n - 1$ and $0 \le j < m - 1$,

$$M_{i,j} + M_{i+1,j+1} \le M_{i,j+1} + M_{i+1,j}. \qquad (5)$$

² This property is sometimes alternatively defined by: for $0 \le i < i' < n$ and $0 \le j < j' < m$, $M_{i,j} + M_{i',j'} \le M_{i',j} + M_{i,j'}$, but it is well known (see, e.g., [24]) that this is equivalent to (5).

The Monge property can be thought of as a discrete version of concavity. It appears implicitly in many optimization problems, for which it permits speeding up their solutions ([24] provides a nice survey). One of the classic techniques used is the SMAWK algorithm for finding row minima.

Given an $n \times m$ matrix M, the minimum of row i, $i = 0, \ldots, n-1$, is the entry of row i that has the smallest value; in case of ties, we take the rightmost entry. Thus, a solution of the row-minima problem is a collection of indices $j(i)$, $i = 0, \ldots, n-1$, such that

$$M_{i,j(i)} = \min_{0 \le j < m} M_{i,j} \quad \text{and} \quad j(i) = \max\{j : M_{i,j} = M_{i,j(i)}\}. \qquad (6)$$

Figure 3 gives four examples of Monge matrices and their row minima.

At first glance it seems that we would have to examine all of the mn entries in M to find the row minima, but [21] proved³

Lemma 3 (The SMAWK algorithm [21]): Let M be an $n \times m$ Monge matrix such that entry $M_{i,j}$ can be calculated in $O(1)$ time. Then the row-minima problem on M can be solved in $O(n + m)$ time.

³ Technically, [21] proved their result for a larger class, the totally monotone matrices. But all applications in the literature seem to be for Monge matrices.

The constant hidden by the $O(\cdot)$ is very small, around 2, and the algorithm is easy to code, so it is quite practical to use.

Note that the SMAWK algorithm doesn't have the time available to build the entire $n \times m$ matrix. Instead, it searches through the matrix in a clever way, constructing entries as needed. One standard use of the SMAWK algorithm is in the speedup of dynamic programs that have Monge properties.

Definition 4: A DP in the form (2) is Monge if, for all $1 \le d \le D$ and $0 \le j \le i < n - 1$,

$$c^{(d)}_{j,i} + c^{(d)}_{j+1,i+1} \le c^{(d)}_{j+1,i} + c^{(d)}_{j,i+1}. \qquad (7)$$

Note: In many DP applications, it is possible that $c^{(d)}_{j,i} = \infty$ for some j, i. The inequality in (7) treats $\infty$ in the natural way, e.g., for any constant c, $c < \infty$ and $c + \infty = \infty$. Also, $\infty + \infty = \infty$. The SMAWK algorithm permits the use of $\infty$ in this way.

Now suppose that a DP defined by (2) is Monge. For $d = 1, 2, \ldots, D$ define the $n \times n$ matrix $M^{(d)}$ by

$$M^{(d)}_{i,j} = \begin{cases} H(d-1,j) + c^{(d)}_{j,i} & \text{if } 0 \le j \le i < n, \\ \infty & \text{otherwise.} \end{cases}$$

Then, from (7), we have

$$\begin{aligned} M^{(d)}_{i,j} + M^{(d)}_{i+1,j+1} &= H(d-1,j) + H(d-1,j+1) + c^{(d)}_{j,i} + c^{(d)}_{j+1,i+1} \\ &\le H(d-1,j) + H(d-1,j+1) + c^{(d)}_{j+1,i} + c^{(d)}_{j,i+1} \\ &= M^{(d)}_{i,j+1} + M^{(d)}_{i+1,j}, \end{aligned}$$

and $M^{(d)}$ is Monge. Note that

$$H(d,i) = \min_{0 \le j \le i} \left( H(d-1,j) + c^{(d)}_{j,i} \right) = \min_{0 \le j \le i} M^{(d)}_{i,j} = \min_{0 \le j < n} M^{(d)}_{i,j}.$$
So, the $H(d,i)$ are just the row minima of $M^{(d)}$; see Figure 3. Since $M^{(d)}$ is Monge, we can use the SMAWK algorithm to find all of its row minima at once in $O(n)$ time. More specifically, let $J(d,i)$ and $M^{(d)}_{i,J(d,i)}$ be the corresponding values (6) returned when running SMAWK($M^{(d)}$). Then the algorithm for filling in the table is just to iteratively run down the rows of the table, using SMAWK to fill in each row by using knowledge of the previous row:

Fill_Table
   For d = 1 to D
      SMAWK($M^{(d)}$)
      For all $0 \le i < n$ set $H(d,i) = M^{(d)}_{i,J(d,i)}$

Fig. 4. The $O(nD)$ algorithm for the value problem.
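The Fill_Table procedure can be sketched in Python with a compact SMAWK routine (adapted from the well-known stack-based formulation; it returns leftmost row minima, whereas (6) uses rightmost ones, which changes only the returned indices, not the row-minimum values). One assumption of this sketch, not part of the paper's construction: instead of setting $c^{(d)}_{j,i} = \infty$ when $2i - j > n$, it pads the sorted $p_i$ with dummy frequencies `big`, larger than the cost of any feasible code. The only remaining $\infty$ entries of $M^{(d)}$ then form an upper-right staircase, which keeps the matrix totally monotone for this tie-breaking rule. Only one DP row is kept at a time, so space is $O(n)$.

```python
INF = float("inf")


def smawk(rows, cols, f):
    """Leftmost row minima {row: col} of a totally monotone matrix f(row, col)."""
    if not rows:
        return {}
    stack = []  # REDUCE: keep at most len(rows) candidate columns
    for c in cols:
        while stack and f(rows[len(stack) - 1], stack[-1]) > f(rows[len(stack) - 1], c):
            stack.pop()
        if len(stack) < len(rows):
            stack.append(c)
    result = smawk(rows[1::2], stack, f)  # recurse on odd-indexed rows
    start = 0  # INTERPOLATE: an even row's minimum lies between its neighbours'
    for r in range(0, len(rows), 2):
        row = rows[r]
        stop = len(stack) - 1
        if r + 1 < len(rows):
            stop = stack.index(result[rows[r + 1]], start)
        k = min(range(start, stop + 1), key=lambda x: (f(row, stack[x]), x))
        result[row] = stack[k]
        start = stop
    return result


def llhc_value(p, D):
    """H(D, n-1) of DP (2) via Fill_Table, keeping one DP row at a time."""
    p = sorted(p)
    n = len(p)
    big = D * sum(p) + 1  # assumption: finite stand-in for "2i - j > n"
    S = [0]  # S[m] = p_1 + ... + p_m, padded with `big` up to m = 2n
    for k in range(1, 2 * n + 1):
        S.append(S[-1] + (p[k - 1] if k <= n else big))

    def c(j, i):
        if i == j == 0:
            return 0
        return S[2 * i - j] if j < i else INF

    H = [0] + [INF] * (n - 1)  # row d = 0 of (2)
    idx = list(range(n))
    for _ in range(D):
        def m(i, j, H=H):  # entry M^(d)_{i,j}, built from the previous row
            return H[j] + c(j, i)
        J = smawk(idx, idx, m)
        H = [m(i, J[i]) for i in idx]  # row d replaces row d - 1
    return H[n - 1] if H[n - 1] < big else INF


print(llhc_value([1, 1, 2, 2, 4, 5, 9], 3))  # 63
```

Any solution that uses a padded frequency costs at least `big`, so a result below `big` is the true optimum and anything else means no code of height at most D exists.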

Note that this algorithm uses $O(nD)$ time since, for each fixed d, the SMAWK algorithm only uses $O(n)$ time. Also note that if we are only interested in the final row, then the algorithm uses only $O(n)$ space, since once row d has been calculated, the values from row $d-1$ can be thrown away.

We now return to the LLHC problem and show that it can 
be plugged into the above machinery. 

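Lemma 4: The $c^{(d)}_{j,i}$ defined in (3) satisfy Monge property (7).

Proof: If $i = j$, the right-hand side of (7) is $\infty$ (since $c^{(d)}_{j+1,i} = \infty$), so (7) is satisfied.

If $j + 1 = i$ or $2(i+1) - n > j$, the right-hand side of (7) is $\infty$ (since $c^{(d)}_{j+1,i} = \infty$ or $c^{(d)}_{j,i+1} = \infty$, respectively), so (7) is satisfied.

If $j + 1 < i$ and $2(i+1) - n \le j$, all four terms of (7) are finite, and (7) can be rewritten as

$$S_{2i-j} + S_{2(i+1)-(j+1)} \le S_{2i-(j+1)} + S_{2(i+1)-j}. \qquad (8)$$

It is easy to verify that

$$S_{2i-j} + S_{2(i+1)-(j+1)} - S_{2i-(j+1)} - S_{2(i+1)-j} = p_{2i-j} - p_{2i-j+2} \le 0,$$

since the $p_i$'s are sorted. Hence (8), and therefore (7), holds. ∎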
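Lemma 4 is also easy to check numerically. The Python sketch below (helper names are ours) builds the costs of (3) and tests inequality (7) for all $0 \le j \le i < n - 1$, using IEEE infinity, for which $c + \infty = \infty$ and $\infty \le \infty$. It deliberately uses p in the order given, so the role of the sortedness assumption is visible.

```python
INF = float("inf")


def llhc_cost(p):
    """Return c(j, i) of equation (3); p is used in the order given."""
    n = len(p)
    S = [0]
    for x in p:
        S.append(S[-1] + x)

    def c(j, i):
        if i == j == 0:
            return 0
        if max(0, 2 * i - n) <= j < i:
            return S[2 * i - j]
        return INF

    return c


def satisfies_7(c, n):
    """Check (7) for all 0 <= j <= i < n - 1 (float inf gives c + inf == inf)."""
    return all(c(j, i) + c(j + 1, i + 1) <= c(j + 1, i) + c(j, i + 1)
               for i in range(n - 1) for j in range(i + 1))


print(satisfies_7(llhc_cost([1, 1, 2, 2, 4, 5, 9]), 7))  # True
print(satisfies_7(llhc_cost([1, 1, 1, 5, 1, 1]), 6))     # False: p not sorted
```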
Thus, from the discussion above, we can find all of the $H(d,i)$ in $O(nD)$ time. In particular, $H(D, n-1)$ will be the cost of the optimal tree with height at most D, which is the required cost of the optimum D-limited code.

Fig. 3. The matrices used for calculating the DP tables in Fig. 2. The shaded entries are the row minima. The row minima for $M^{(d)}$ are exactly the row entries in the $H(d,i)$ table in Fig. 2. The column indices of the corresponding row minima are the $J(d,i)$ entries.

We have thus seen how to solve the value problem in $O(nD)$ time. The difficulty is that constructing the optimal tree associated with $H(D, n-1)$ would require finding the associated optimal valid sequence with $i_D = n - 1$. This would require solving the construction problem by finding all indices $I_d$ in (4). The standard way of solving this problem is to maintain an array storing the $J(d,i)$ values returned by the algorithm, start from $H(D, n-1)$, and backtrack through the $J(\cdot,\cdot)$ array, constructing the corresponding sequence by setting $I_D = n - 1$ and $I_{d-1} = J(d, I_d)$. Unfortunately, this requires maintaining a size-$\Theta(nD)$ auxiliary array, which requires too much space.

III. Solving the Construction Problem in $O(nD)$ Time and $O(n + D)$ Space

Fig. 5. The dropping-level graph associated with the example from Figures 2 and 3. The bold edges are the minimum cost path from (0, 0) to (4, 6). Note that the i coordinates of the path are (0, 2, 4, 5, 6), which is exactly the sequence of $J(d,i)$'s corresponding to the optimal solution of the problem, which is also the sequence corresponding to the optimal tree.

Let V be the set of grid nodes (d, i) with $0 \le d \le D$ and $0 \le i < n$. Consider the directed graph $G = (V, E)$ in which (d, i) points to all nodes on the next level that lie directly below it or below and to its right, i.e.,

$$E = \{((d,j),(d+1,i)) \mid (d,j) \in V,\ d < D,\ j \le i < n\}.$$

See Figure 5. Such graphs are sometimes called dropping level-graphs [25]. Now assign edge $((d-1,j),(d,i))$ the weight $c^{(d)}_{j,i}$. The length of a path in G is just the sum of the weights of the edges in the path. The important observation is that $H(d,i)$ in DP (2) is simply the length of the min-cost path from (0, 0) to (d, i) in this weighted G. More specifically, the value problem is to find the length of a shortest path, and the construction problem is to find an actual shortest path.

A priori, finding such a path seems to require $O(nD)$ space. There are two different algorithms in the literature for reducing the space down to $O(n + D)$ in related problems.

The first was for finding a maximum common subsequence of two sequences. This reduced down to the problem of finding a max-length path in something very similar to a dropping level-graph in which each vertex has bounded indegree and bounded outdegree. Hirschberg [26] developed a $\Theta(nD)$ time, $\Theta(n + D)$ space algorithm for this problem. His algorithm was very influential in the bioinformatics community, and its technique is incorporated into many later algorithms, e.g., [27], [28]. The technique's performance is very dependent upon the bounded degree of the vertices, which is not true in our case.

The second, due to Munro and Ramirez [25], was exactly for the problem of constructing min-cost paths in full dropping level-graphs. Their algorithm ran in $\Theta(n^2 D)$ time and $\Theta(n + D)$ space. Their $\Theta(n^2 D)$ time is too expensive for us. We will now see how to reduce this down to $\Theta(nD)$ using the Monge speedup while still maintaining the $O(n + D)$ space.

The general problem will be to construct an optimal u-w path in G, where $u = (d_u, i_u)$ is above and not to the right of $w = (d_w, i_w)$, i.e., $d_u < d_w$ and $i_u \le i_w$. Let G(u, w) be the subgrid with upper-left corner u and lower-right corner w (with associated induced edges from G). First note that, because G is a dropping level-graph, any optimal (min or max cost) u-w path in G must lie completely in G(u, w). Both algorithms [26], [25] start from the same observation, which is to build the path recursively, i.e., by first (a) finding a point v = (d, i) halfway (by link distance) on the optimal u-w path in G(u, w) and then (b) outputting the recursively constructed optimal u-v path in G(u, v) and optimal v-w path in G(v, w).

For dropping level-graphs, if $u = (d_1, i_1)$ and $w = (d_2, i_2)$, then the midlevel must be $\bar d = \lfloor (d_1 + d_2)/2 \rfloor$. Suppose that we had an algorithm Mid(u, w) that returned a point $v = (\bar d, i)$ on a shortest u-w path in G(u, w). Then, translated into our notation and with appropriate termination conditions, the construction algorithm can be written as:



Path(u, w)
1. If u = (d, j) and w = (d + 1, i) then
2.     output edge (u, w)
3. Else if u = (d, i) and w = (d', i) then
4.     output vertical path from u to w
5. Else
6.     set v = Mid(u, w)
7.     Path(u, v); Path(v, w)

Fig. 6. The algorithm for constructing a min-cost u-w path.

(Figure 7 illustrates this idea.) To solve the original problem we just call Path($u_0, w_0$), where $u_0 = (0, 0)$ and $w_0 = (D, n-1)$. The recursion terminates because, at each recursive call, the vertical distance $d_w - d_u$ decreases. Furthermore, when the recursion terminates, either (i) $u = (d, j)$ and $w = (d+1, i)$, so the only u-w path in G(u, w) is the edge (u, w), or (ii) $u = (d, i)$ and $w = (d', i)$, so the only u-w path in G(u, w) is the vertical path going down from u to w.

The efficiency of the resulting algorithm, both in time and space, will depend upon how efficiently v = Mid(u, w) can be found. Note that, with the exception of the calls of type Mid(u, w), the rest of the execution of Path($u_0, w_0$) (including all recursive calls) only requires a total of $O(D)$ space, since each recursive call uses only $O(1)$ space and there are at most $O(D)$ such calls. Thus, if Mid(u, w) can be found using $O(n + D)$ space, then the entire procedure requires only $O(n + D)$ space. This is actually how both [26] and [25] achieve their space bounds. The two algorithms differ in how they calculate v. Although both their approaches can be used for our problem, we will work with a modified version of that of [25], since it will be simpler to explain.

We now describe how to use the SMAWK algorithm to find Mid($u_0, w_0$) in $O(nD)$ time and $O(n)$ space. The extension to general Mid(u, w) will follow later. Recall that the procedure Fill_Table from Figure 4 used the fact that $H(\cdot,\cdot)$ was Monge and the SMAWK algorithm to iteratively fill in the rows $H(d,\cdot)$, for $d = 1, 2, \ldots, D$. Given row $H(d-1,\cdot)$, the procedure calculated $H(d,\cdot)$ in $O(n)$ time using SMAWK, and then threw away $H(d-1,\cdot)$.

Consider an arbitrary node (d, i) on level $d \ge \bar d$. The shortest path from $u_0$ to (d, i) must pass through some node on level $\bar d$. We now modify Fill_Table to "remember" this node. More specifically, our algorithm will calculate auxiliary data pred(d, i).

• For $d < \bar d$, pred(d, i) will be undefined.

• For $d \ge \bar d$, pred(d, i) will be an index j such that node $(\bar d, j)$ appears on some shortest path from $u_0$ to (d, i).

So, when the procedure terminates, $v = (\bar d, \mathrm{pred}(D, n-1))$ will be Mid($u_0, w_0$).

By definition, on level $\bar d$, we have $\mathrm{pred}(\bar d, i) = i$.



For $d > \bar d$, suppose $(d-1, j')$ is the immediate predecessor of (d, i) on the shortest path from $u_0$ to (d, i). Then (i) a shortest path from $u_0$ to $(d-1, j')$ followed by (ii) the edge from $(d-1, j')$ to (d, i) is (iii) a shortest path from $u_0$ to (d, i); we may therefore set $\mathrm{pred}(d,i) = \mathrm{pred}(d-1, j')$.

We can use this observation to modify Fill_Table to calculate the $\mathrm{pred}(d,\cdot)$ information; the index $j' = J(d,i)$ returned by SMAWK is such a predecessor.

Mid($u_0, w_0$)
   For d = 1 to $\bar d$
      SMAWK($M^{(d)}$)
      For all $0 \le i < n$ set $H(d,i) = M^{(d)}_{i,J(d,i)}$
   For all $0 \le i < n$ set $\mathrm{pred}(\bar d, i) = i$
   For $d = \bar d + 1$ to D
      SMAWK($M^{(d)}$)
      For all $0 \le i < n$ set $H(d,i) = M^{(d)}_{i,J(d,i)}$
      For all $0 \le i < n$ set $\mathrm{pred}(d,i) = \mathrm{pred}(d-1, J(d,i))$

Fig. 8. Returns the midpoint, by link distance, on the min-cost $u_0$-$w_0$ path.



Note that Mid($u_0, w_0$) can throw away all of the values $\mathrm{pred}(d-1,\cdot)$ and $H(d-1,\cdot)$ after the values $\mathrm{pred}(d,\cdot)$ and $H(d,\cdot)$ have been calculated, so it only uses $O(n)$ space. Similarly to the analysis of Fill_Table, it uses only $O(nD)$ time, since each call to the SMAWK algorithm uses only $O(n)$ time.

So far, we have only shown how to find v = Mid($u_0, w_0$). Note that the only assumptions we used were that $H(\cdot,\cdot)$ satisfies DP (2) and is Monge, i.e., the $c^{(d)}_{j,i}$ satisfy (7).

Now suppose that we are given $u = (d_u, i_u)$ and $w = (d_w, i_w)$ with $d_u < d_w$ and $i_u \le i_w$. G(u, w) is a dropping level-graph on its own nodes, so the cost of the shortest path from u to any node $(d_u + d, i_u + i) \in G(u, w)$ is $\hat H(d, i)$ defined by

$$\hat H(d,i) = \begin{cases} 0 & \text{if } d = 0,\ i = 0, \\ \infty & \text{if } d = 0,\ 0 < i < N, \\ \min_{0 \le j \le i} \left( \hat H(d-1,j) + \hat c^{(d)}_{j,i} \right) & \text{if } d > 0,\ 0 \le i < N, \end{cases} \qquad (9)$$

where $N = i_w - i_u + 1$ and $\hat c^{(d)}_{j,i} = c^{(d_u + d)}_{j + i_u,\, i + i_u}$. Note that this new DP is exactly in the same form as (2), just with a different n and shifted $c^{(d)}_{j,i}$. Since the original $c^{(d)}_{j,i}$ satisfy (7), so do the $\hat c^{(d)}_{j,i}$. Thus (9) with the $\hat c^{(d)}_{j,i}$ is Monge as well. Therefore, we can run exactly the same algorithm written in Figure 8 to find the midpoint $v = (\bar d, i) = \mathrm{Mid}(u, w)$ of the min-cost u-w path in $O((d_w - d_u)N)$ time and $O((d_w - d_u) + N) = O(D + n)$ space.
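The whole construction can be sketched in Python as below; `llhc_construct`, `mid` and `build` are hypothetical names for Path (Fig. 6) and Mid (Fig. 8) with the shifted DP (9). To keep the sketch short and independent of the SMAWK details, `row_minima` finds row minima by brute force, so this version runs in $O(n^2 D)$ time (the Munro-Ramirez bound); substituting a SMAWK row-minima routine for it is what yields $O(nD)$. Only one DP row plus one pred row is alive inside `mid`, and the output path holds $O(D)$ nodes.

```python
INF = float("inf")


def llhc_construct(p, D):
    """Sketch of Path (Fig. 6) with Mid (Fig. 8): return (I_0, ..., I_D)."""
    p = sorted(p)
    n = len(p)
    assert n <= 2 ** D, "no binary code with all lengths <= D exists"
    S = [0]
    for x in p:
        S.append(S[-1] + x)

    def c(j, i):  # equation (3); the same for every level d in LLHC
        if i == j == 0:
            return 0
        return S[2 * i - j] if max(0, 2 * i - n) <= j < i else INF

    def row_minima(N, m):  # brute force here; SMAWK would take O(N) time
        return [min(range(N), key=lambda j: (m(i, j), j)) for i in range(N)]

    def mid(u, w):  # midpoint of a min-cost u-w path via the shifted DP (9)
        (du, iu), (dw, iw) = u, w
        N, half = iw - iu + 1, (dw - du) // 2
        H = [0] + [INF] * (N - 1)
        pred = None
        for d in range(1, dw - du + 1):
            def m(i, j, H=H):
                return H[j] + c(j + iu, i + iu)  # shifted costs
            J = row_minima(N, m)
            H = [m(i, J[i]) for i in range(N)]  # only one row is kept
            if d == half:
                pred = list(range(N))  # pred(dbar, i) = i
            elif d > half:
                pred = [pred[J[i]] for i in range(N)]
        return (du + half, iu + pred[N - 1])

    path = [(0, 0)]

    def build(u, w):  # Path(u, w); appends the nodes after u
        (du, iu), (dw, iw) = u, w
        if dw == du + 1:  # lines 1-2: a single edge
            path.append(w)
        elif iu == iw:  # lines 3-4: vertical path
            path.extend((d, iu) for d in range(du + 1, dw + 1))
        else:  # lines 5-7
            v = mid(u, w)
            build(u, v)
            build(v, w)

    build((0, 0), (D, n - 1))
    return [i for _, i in path]


print(llhc_construct([1, 1, 2, 2, 4, 5, 9], 3))  # [0, 3, 5, 6]
```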

As discussed previously, if Mid(u, w) only requires $O(n + D)$ space, then Path(u, w) only requires $O(n + D)$ space, so we have completed the space analysis. It remains to analyze the running time. Set

$$\mathrm{Area}(u,w) = (N - 1)(d_w - d_u)$$

to be the "area" of G(u, w). Recall that line 3 of Path(u, w) implies that $i_u \ne i_w$ when Mid(u, w) is called. Therefore $N > 1$, so $N \le 2(N - 1)$, and the running time of Mid(u, w) is

$$O((d_w - d_u)N) = O(\mathrm{Area}(u,w)).$$
Fig. 7. An illustration for finding the optimal path. Here, D = 8 and there are 3 levels of recursion. The solid circles are the intermediate nodes found by the Mid(u, v) procedures. The first level of recursion finds the midpoint on level 4; the second level, the midpoints on levels 2 and 6; the third, the midpoints on levels 1, 3, 5, 7. At that point all subproblems are of height one and easily solvable. Note that each recursive call splits a problem on a box of height $2^{\ell}$ into two problems on disjoint boxes of height $2^{\ell-1}$.



We now analyze the running time of Path($u_0, w_0$). First consider the recursive calls in which lines 1-4 occur, i.e., the recursion terminates. The total work performed by such calls is the total number of edges output. Since an edge is output only once and the total path contains D edges, the total work performed is $O(D)$.

Next consider the calls in which lines 5-7 occur. Since each such call returns a vertex v on the path, there are only $D - 1$ such calls, so lines 6 and 7 are only called $O(D)$ times and their total work, with the exception of the call to Mid(u, w), is $O(D)$.

Finally, consider the work performed by the Mid(u, w) calls. Partition the calls into levels.

• Level 1 is the original call Mid($u_0, w_0$).

• Level 2 contains the recursive calls directly made by the level-1 call.

• In general, level ℓ contains the recursive calls directly made by the level-(ℓ−1) calls.

Note that if Mid(u, w) is a level-ℓ call with $u = (d_u, i_u)$ and $w = (d_w, i_w)$, then

$$\frac{D}{2^{\ell-1}} - 1 < d_w - d_u < \frac{D}{2^{\ell-1}} + 1.$$

Furthermore, by induction, if Mid(u, w) and Mid(u', w') are two different level-ℓ calls, then the horizontal ranges $[i_u, i_w]$ and $[i_{u'}, i_{w'}]$ are disjoint except for possibly $i_w = i_{u'}$ or $i_u = i_{w'}$.

Fix ℓ. Let $\mathrm{Mid}(u_j, w_j)$, $j = 1, \ldots, t$, be the calls at level ℓ, and let $N_j$ be the corresponding values of N. The facts that each grid $G(u_j, w_j)$ has height $< D/2^{\ell-1} + 1$ and that the horizontal ranges of the grids are disjoint imply

$$\sum_{j=1}^{t} \mathrm{Area}(u_j, w_j) \le \left( \frac{D}{2^{\ell-1}} + 1 \right) \sum_{j=1}^{t} (N_j - 1) \le \left( \frac{D}{2^{\ell-1}} + 1 \right) n.$$

Thus the total work of all level-ℓ calls is $O\!\left(n\left(\frac{D}{2^{\ell-1}} + 1\right)\right)$. Summing over the $\lceil \log D \rceil$ levels, we get that the total work performed by all of the Mid(u, w) calls on line 6 is

$$O\!\left( \sum_{\ell=1}^{\lceil \log D \rceil} n \left( \frac{D}{2^{\ell-1}} + 1 \right) \right) = O(nD + n \log D) = O(nD).$$

Thus, the total work performed by Path($u_0, w_0$) is $O(nD)$ and we are finished.



IV. Further Applications 

We just saw how, in $O(nD)$ time and $O(n + D)$ space, to solve the construction problem for any DP in form (2) that satisfies the Monge property (7). $O(nD)$ time was known previously; the $O(n + D)$ space bound is the new improvement. There are many other DP problems besides the binary LLHC that satisfy (7) and whose space can thus be improved. We illustrate with three examples.

The r-ary LLHC problem: 

We have discussed the binary LLHC problem, in which $|\Sigma| = 2$. The general r-ary alphabet case with N probabilities is still modeled by a DP in form (2), but with $n = \frac{N-1}{r-1} + 1$. The only difference is that (10) is replaced by

$$c^{(d)}_{ij} = \begin{cases} S_{ri-j} & \text{if } \max\{0, ri - N\} \le j < i, \\ \infty & \text{otherwise.} \end{cases}$$

A full derivation of this DP is given in Appendix A. The proof that the $c^{(d)}_{ij}$ satisfy the Monge property is similar to the proof of Lemma 4. Thus, we can construct a solution to the r-ary LLHC problem in O(ND) time and O(N) space as well.
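As a numerical sanity check (not a proof), the Python sketch below builds the r-ary cost matrix from prefix sums and tests the quadrangle inequality $c_{i,j} + c_{i+1,j+1} \le c_{i,j+1} + c_{i+1,j}$ on every 2×2 block whose four entries are finite. We assume here that (7) is this standard form of the Monge property; the function names are hypothetical. For finite entries the two sides differ by $p_{ri-j} - p_{ri+r-j}$, which is never positive when the $p_i$ are sorted in nondecreasing order.

```python
import math
from itertools import accumulate
from typing import List, Sequence


def rary_cost_matrix(p: Sequence[float], r: int) -> List[List[float]]:
    """c[i][j] = S_{ri-j} if max(0, ri - N) <= j < i, else inf.

    p must be sorted nondecreasing with (N - 1) divisible by (r - 1)
    (pad with zero-probability dummies first if needed)."""
    N = len(p)
    S = [0.0] + list(accumulate(p))          # S[m] = p_1 + ... + p_m
    n = (N - 1) // (r - 1) + 1               # indices 0 .. n-1
    c = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        for j in range(max(0, r * i - N), i):
            c[i][j] = S[r * i - j]           # r*i - j <= N by the lower bound on j
    return c


def is_monge_on_finite_blocks(c: List[List[float]]) -> bool:
    """Check c[i][j] + c[i+1][j+1] <= c[i][j+1] + c[i+1][j] wherever all
    four entries are finite (blocks touching inf entries are skipped)."""
    n = len(c)
    for i in range(n - 1):
        for j in range(n - 1):
            quad = (c[i][j], c[i + 1][j + 1], c[i][j + 1], c[i + 1][j])
            if math.inf in quad:
                continue
            if quad[0] + quad[1] > quad[2] + quad[3]:
                return False
    return True
```

Feeding the check a decreasing sequence of weights makes it fail, which is a quick way to see that the sorted order of the $p_i$ is what drives the Monge property here.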

D medians on a line: 

We are given $n - 1$ customers located on the positive real line; customer i is at location $v_i$. Without loss of generality, assume $v_1 < v_2 < \cdots < v_{n-1}$. There are $D < n$ service centers located on the line, and a customer is serviced by the closest service center to its left (thus we always assume a service center at $v_0 = 0$). Each customer has a service request $w_i > 0$. The cost of servicing customer i is $w_i$ times the distance to its service center. In [17], motivated by the application of optimally placing web proxies on a linear topology network, Woeginger showed that this problem could be modeled by a DP in form (2) where

$$c_{ij} = \sum_{l=j+1}^{i} w_l \left(v_l - v_{j+1}\right)$$

and proved that these satisfy the Monge property (7). He then used the SMAWK algorithm to construct a solution in O(nD) time and O(nD) space. Using the technique we just described, this can be reduced to O(nD) time and O(n) space.
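SMAWK needs each entry $c_{ij}$ in O(1) time. One standard way to get that (our illustration; [17] may compute the entries differently) is to expand the sum with prefix sums $W_i = \sum_{l \le i} w_l$ and $X_i = \sum_{l \le i} w_l v_l$, giving $c_{ij} = (X_i - X_j) - v_{j+1}(W_i - W_j)$. The class `MedianCost` and the helper `direct_cost` below are hypothetical names used only for this sketch.

```python
from itertools import accumulate
from typing import Sequence


class MedianCost:
    """O(1) evaluation of c_ij = sum_{l=j+1}^{i} w_l * (v_l - v_{j+1}), 0 <= j < i.

    v[l], w[l] describe customers l = 1 .. n-1; index 0 is a placeholder for the
    always-present service center at v[0] = 0 (its weight cancels out)."""

    def __init__(self, v: Sequence[float], w: Sequence[float]):
        self.v = list(v)
        self.W = list(accumulate(w))                                # W[i] = sum_{l<=i} w_l
        self.X = list(accumulate(wl * vl for wl, vl in zip(w, v)))  # X[i] = sum_{l<=i} w_l v_l

    def __call__(self, i: int, j: int) -> float:
        return (self.X[i] - self.X[j]) - self.v[j + 1] * (self.W[i] - self.W[j])


def direct_cost(v: Sequence[float], w: Sequence[float], i: int, j: int) -> float:
    """The defining sum, evaluated term by term in O(i - j) time."""
    return sum(w[l] * (v[l] - v[j + 1]) for l in range(j + 1, i + 1))
```

For example, with locations v = [0, 1, 3, 4, 8] and requests w = [0, 2, 1, 5, 3], serving customers 2 to 4 from the center at $v_2 = 3$ costs $1 \cdot 1 + 5 \cdot 5 = 20$, and `MedianCost(v, w)(4, 1)` returns the same value after O(n) preprocessing.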

We also mention that there is an undirected variant of this problem in which a node is serviced by its closest service center looking both left and right. There are many algorithms in the literature that (explicitly or implicitly) use concavity to construct solutions for this problem in O(nD) time using O(nD) space, e.g., [29], [?], [?]. One of these ([?]) does so by using a DP formulation that is in the DP form (2) and satisfies the Monge property (7), so the technique in this paper can reduce the space for this problem down to O(n) as well.

Wireless Paging: 



The third application comes from wireless mobile paging. A user can be in one of N different cells. We are given a probability distribution in which $p_i$ denotes the probability that the user is in cell i, and we want to minimize the bandwidth needed to send paging requests that identify the cell in which the user resides. This problem was originally conjectured to be NP-complete, but [32] developed a DP algorithm for it. The input of the problem is the n probabilities $p_1 \ge p_2 \ge \cdots \ge p_n$ and an integer $D < n$ (corresponding to the number of paging rounds used). The DP developed by [32] is exactly in our DP form (2) with

$$c^{(d)}_{ij} = \begin{cases} \cdots & \text{if } d - 1 \le j < i, \\ \infty & \text{otherwise.} \end{cases} \qquad (11)$$



The goal is to compute H(D, n), which will be the minimum 
expected bandwidth needed. Solving the construction version 
of this DP permits constructing the actual paging protocol that 
yields this minimum bandwidth. 

[32] used the naive algorithm to solve the DP in $O(n^2 D)$ time and O(nD) space. [33] proved that the $c^{(d)}_{ij}$ defined by (11) satisfy the Monge property (7) and thus reduced the time to O(nD), but still required O(nD) space. The algorithm in this paper permits improving the space complexity of constructing the protocol down to O(n).

V. Conclusion 

The standard approach to solving the Length-Limited Huffman Coding (LLHC) problem is via the special-purpose Package-Merge algorithm of Hirschberg and Larmore [9], which runs in O(nD) time and O(n) space, where n is the number of codewords and D is the length limit on the code.

In this note we point out that this problem can be solved in the same time and space using a straightforward Dynamic Programming formulation. We started by recalling that the LLHC problem can be modeled using a DP of the form

$$H(d,i) = \begin{cases} 0 & \text{if } d = 0,\ i = 0, \\ \infty & \text{if } d = 0,\ 0 < i \le n, \\ \min_{0 \le j \le i} \left\{ H(d-1, j) + c^{(d)}_{ij} \right\} & \text{if } d > 0,\ 0 \le i \le n, \end{cases} \qquad (12)$$

where H(d, n) denotes the minimum cost of a code with longest word at most d and the $c^{(d)}_{ij}$ are easily calculable constants. This immediately yields an $O(n^2 D)$-time, O(nD)-space algorithm. We then noted that, using standard DP speedup techniques, e.g., the SMAWK algorithm, the time can be reduced to O(nD). The main contribution of this paper is to note that, once the problem is expressed in this formulation, the space can be reduced to O(n) while keeping the time at O(nD). The space reduction developed for this problem was also shown to apply to other problems in the literature that previously had been thought to require O(nD) space.
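For contrast with the space-saving approach, here is a Python sketch of the straightforward algorithm: fill the whole $(D+1) \times (n+1)$ table of (12) with naive minimization, store one back-pointer per entry, and trace the optimal sequence back from H(D, n). It uses $O(n^2 D)$ time and O(nD) space; the function name and the callback `cost(d, i, j)` (standing for $c^{(d)}_{ij}$) are illustrative assumptions.

```python
import math
from typing import Callable, List, Tuple


def solve_full_table(cost: Callable[[int, int, int], float],
                     D: int, n: int) -> Tuple[float, List[int]]:
    """Evaluate DP (12) naively and return (H(D, n), [i_0, ..., i_D])."""
    H = [[math.inf] * (n + 1) for _ in range(D + 1)]
    back = [[0] * (n + 1) for _ in range(D + 1)]   # minimizing j for each entry
    H[0][0] = 0.0                                  # H(0, 0) = 0; H(0, i > 0) = inf
    for d in range(1, D + 1):
        for i in range(n + 1):
            for j in range(i + 1):                 # min over 0 <= j <= i
                cand = H[d - 1][j] + cost(d, i, j)
                if cand < H[d][i]:
                    H[d][i], back[d][i] = cand, j
    seq = [n]
    for d in range(D, 0, -1):                      # walk the back-pointers
        seq.append(back[d][seq[-1]])
    return H[D][n], seq[::-1]
```

With the costs of (14) for weights 1, 1, 2, 3, 5 (so the target index is $(5-1)/(2-1) = 4$), D = 3 gives 26, one more than the unrestricted Huffman cost 25 obtained with D = 4. The back-pointer table is exactly the O(nD) memory that the technique of this paper avoids.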

We conclude by noting that if we are only interested in solving the standard Huffman coding problem and not the LLHC one, then DP (2) with $c^{(d)}_{ij}$ defined by (10) collapses down to

$$H(i) = \min_{\max\{0,\, ri - N\} \le j < i} \left\{ H(j) + S_{ri-j} \right\} \qquad (13)$$

where H(i) denotes the minimum cost of a "valid sequence" ending in i. $H\!\left(\frac{N-1}{r-1}\right)$ will be the cost of an optimal complete sequence, and solving the construction problem for this DP will give this optimal sequence. We can construct the code from this optimal sequence in O(N) time.
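A direct $O(n^2)$ evaluation of (13) in Python makes the recurrence concrete and also shows its "online" character: every candidate for row i uses some already-computed H(j) with j < i. The function name is hypothetical, and the input is assumed sorted and already padded with zero-weight dummies so that $r - 1$ divides $N - 1$.

```python
import math
from itertools import accumulate
from typing import Sequence


def rary_huffman_cost(p: Sequence[float], r: int) -> float:
    """Minimum r-ary Huffman cost via H(i) = min_j { H(j) + S_{ri-j} }.

    p: probabilities sorted nondecreasing, with (len(p) - 1) % (r - 1) == 0."""
    N = len(p)
    if (N - 1) % (r - 1) != 0:
        raise ValueError("pad p with zero-weight dummies first")
    S = [0.0] + list(accumulate(p))       # S[m] = p_1 + ... + p_m
    last = (N - 1) // (r - 1)             # number of internal nodes
    H = [0.0] + [math.inf] * last         # H(0) = 0
    for i in range(1, last + 1):
        for j in range(max(0, r * i - N), i):
            H[i] = min(H[i], H[j] + S[r * i - j])   # uses earlier row minima H(j)
    return H[last]
```

For weights 1, 1, 2, 3, 5 it returns 25 with r = 2 and 16 with r = 3, the costs of the corresponding binary and ternary Huffman codes.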

There is a subtle point here which should be mentioned. The matrix M defined by

$$M_{ij} = \begin{cases} H(j) + S_{ri-j} & \text{if } \max\{0, ri - N\} \le j < i, \\ \infty & \text{otherwise} \end{cases}$$

is Monge (the proof is similar to that of Lemma 4). We cannot use the SMAWK algorithm to find its row minima and solve the problem, though. The reason is that, as stated in Lemma 3, the SMAWK algorithm requires being able to calculate any requested entry $M_{ij}$ in O(1) time. In our current DP, though, the $M_{ij}$ depend upon the values H(j), which are the row minima of other rows in the same matrix! Thus, we have no way of calculating $M_{ij}$ in O(1) time when required, and the SMAWK algorithm cannot be applied. This is why Larmore and Przytycka [16] needed to use the more sophisticated CLWS algorithm of [20] to solve the binary (r = 2) version of this problem. Other algorithms for more general versions of the CLWS problem have since appeared, e.g., [34], that could also be used to solve this problem in O(n) time, but they are also quite complicated.
To summarize, by transforming r-ary Huffman coding into a DP and using sophisticated tools such as [20] or [34], we can solve the problem in O(n) time. This is not of practical interest, though, since the simple greedy Huffman encoding algorithm is just as fast. Where the DP formulation helps is in the LLHC problem, exactly where the greedy procedure fails. In that case we have the added practical benefit of being able to use the simple SMAWK algorithm rather than the more complicated [20] or [34].
Appendix A
Derivation of the LLHC Dynamic Program 

In order to make this note self-contained we provide a brief derivation of the DP that models the LLHC. To the best of our knowledge, the derivation for the general r-ary case has never been written down before (although it is known as "folklore").

A set of n prefix-free codewords in an r-ary alphabet can be represented by an r-ary tree with n leaves. The $i$-th edge from an internal node to its children is labeled by $a_i$. Each leaf corresponds to a codeword, which is the concatenation of the characters on the root-to-leaf path. The expected code length then equals the weighted external path length of the tree.

Denote the height of the tree by h. The lowest leaves are on level 0; the root is at level h. Optimal (minimum weighted external path length) assignments of the $p_i$'s to the leaves always assign smaller probabilities to leaves at lower levels. Since the probabilities are given in sorted order, this assignment can be done in O(n) time for a given tree. The cost of a tree is its weighted external path length w.r.t. an optimal assignment.

Define the degree of a node to be the number of its children. A node is complete if it has degree r, and a tree is complete if all its internal nodes are complete. The following properties are easy to prove:

Property 1: In an optimal tree, the internal nodes at levels $\ge 2$ are complete.

Property 2: There is an optimal tree that has at most one incomplete internal node, and if this node exists, it is at level 1. Furthermore, the degree of this incomplete node is $\ge 2$.

These properties imply that the optimal tree is almost complete and has $\left\lceil \frac{n-1}{r-1} \right\rceil$ internal nodes. If $n - 1$ is divisible by $r - 1$, the tree is complete. Otherwise, we can add

$$\left( \left\lceil \frac{n-1}{r-1} \right\rceil - \frac{n-1}{r-1} \right)(r-1) \le r - 2$$

dummy leaves to make it complete. We assign dummy $p_i$'s with zero values to these dummy leaves. It is easy to see that the new tree with these dummy leaves is precisely an optimal tree for the probabilities with the added zero-valued dummy $p_i$'s. So, finding an optimal tree for the probabilities with these dummy $p_i$'s is equivalent to the original problem. Therefore, w.l.o.g., we assume that in the original problem the optimal tree is a complete tree, i.e., we assume $n - 1$ is always a multiple of $r - 1$. In this way we transform the r-ary Huffman coding problem into the problem of finding an optimal complete r-ary tree with n leaves.
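The padding step amounts to a single modular computation. In this Python sketch (hypothetical helper name) the zero-valued dummies are placed at the front so that the list stays sorted in nondecreasing order; $(-(n-1)) \bmod (r-1)$ equals $\left\lceil \frac{n-1}{r-1} \right\rceil (r-1) - (n-1)$.

```python
from typing import List, Sequence


def pad_with_dummies(p: Sequence[float], r: int) -> List[float]:
    """Prepend zero-weight dummy leaves so that (n - 1) is a multiple of r - 1.

    p is assumed nonempty and sorted nondecreasing; at most r - 2 dummies are added."""
    n = len(p)
    dummies = (-(n - 1)) % (r - 1)    # = ceil((n-1)/(r-1)) * (r-1) - (n-1)
    return [0.0] * dummies + list(p)
```

For instance, four weights with r = 3 need one dummy (5 leaves, 2 internal nodes), while in the binary case no padding is ever required.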

A complete tree of height h can be fully represented by a sequence $(i_0, i_1, \ldots, i_h)$, where $i_k$ denotes the number of internal nodes at levels $\le k$. Note that from this sequence we can calculate $I_k = i_k - i_{k-1}$, the number of internal nodes on level k, and with that information we can reconstruct the tree in O(n) time as follows:
Create

1. For k = 1 to h
2.     Create $I_k$ nodes $V_k = \{v_1, \ldots, v_{I_k}\}$ on level k;
3.     Create $r I_k - I_{k-1}$ leaves on level $k - 1$;
4.     Make $\{v_1, \ldots, v_{I_k}\}$ the parents of the $r I_k$ nodes on level $k - 1$.
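The Create procedure translates into a few lines of Python. This sketch (hypothetical function name) records only leaf depths, where a leaf on level $\ell$ has depth $h - \ell$, rather than building node objects; that is enough to evaluate the cost of the tree. We take $I_0 = 0$, and a negative leaf count in step 3 signals that the sequence does not describe a tree or forest.

```python
from typing import List, Sequence


def create_leaf_depths(seq: Sequence[int], r: int) -> List[int]:
    """Leaf depths of the r-ary tree (or forest) that Create builds from
    (i_0, ..., i_h), listed from level 0 upward (deepest leaves first)."""
    h = len(seq) - 1
    depths: List[int] = []
    prev = 0                               # I_{k-1}; we take I_0 = 0
    for k in range(1, h + 1):
        cur = seq[k] - seq[k - 1]          # I_k internal nodes on level k
        leaves = r * cur - prev            # step 3: leaves on level k - 1
        if leaves < 0:                     # too few child slots for level k - 1
            raise ValueError("sequence does not represent a tree or forest")
        depths += [h - (k - 1)] * leaves   # a leaf on level k - 1 has depth h - k + 1
        prev = cur
    return depths
```

Pairing the returned depths in order with $p_1 \le p_2 \le \cdots \le p_n$ gives the optimal assignment (smaller weights on lower leaves). For example, (0, 1, 2, 3, 4) with r = 2 yields depths [4, 4, 3, 2, 1], and weights 1, 1, 2, 3, 5 then cost 25.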

We will now see how to rewrite the cost of a tree using its 
representative sequence: 

Lemma 5: If $\mathcal{I} = (i_0, i_1, \ldots, i_h)$ represents tree T, then T has $r i_k - i_{k-1}$ leaves on levels $< k$.

Proof: Consider the forest formed by the subtrees of T rooted at the internal nodes on level k. It is composed of $I_k = i_k - i_{k-1}$ trees with roots on level k, and in total it contains $i_k$ internal nodes.

If T' is a complete r-ary tree with m internal nodes, then T' has $(r-1)m + 1$ leaves, so our forest must contain $(r-1)i_k + I_k = r i_k - i_{k-1}$ leaves. ■

Recall that $S_m = \sum_{i=1}^{m} p_i$ for $1 \le m \le n$ (and $S_0 = 0$). Using the lemma above, we have

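Lemma 6: If the sequence $(i_0, i_1, \ldots, i_h)$ represents a tree, then the cost of the tree is $\sum_{k=1}^{h} S_{r i_k - i_{k-1}}$.

Proof: Recall from Lemma 5 that $r i_k - i_{k-1}$ is the number of leaves at levels $< k$. Since smaller weights are assigned to lower leaves, these leaves carry exactly the weights $p_1, \ldots, p_{r i_k - i_{k-1}}$. So

$$\begin{aligned}
\text{Cost of the tree} &= \sum_{\ell=0}^{h} (\text{sum of weights of leaves at level } \ell) \cdot (h - \ell) \\
&= \sum_{\ell=0}^{h} (\text{sum of weights of leaves at level } \ell) \cdot \sum_{k=\ell+1}^{h} 1 \\
&= \sum_{k=1}^{h} \sum_{\ell=0}^{k-1} (\text{sum of weights of leaves at level } \ell) \\
&= \sum_{k=1}^{h} (\text{sum of the weights of leaves at levels} < k) \\
&= \sum_{k=1}^{h} S_{r i_k - i_{k-1}}.
\end{aligned}$$

■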
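Lemma 6 is easy to check numerically. The Python sketch below (hypothetical function name) evaluates $\sum_{k=1}^{h} S_{r i_k - i_{k-1}}$ directly. For the binary sequence (0, 1, 2, 3, 4) and weights 1, 1, 2, 3, 5 it gives $S_2 + S_3 + S_4 + S_5 = 2 + 4 + 7 + 12 = 25$, which equals the weighted external path length of the corresponding tree (leaf depths 4, 4, 3, 2, 1).

```python
from itertools import accumulate
from typing import Sequence


def sequence_cost(seq: Sequence[int], p: Sequence[float], r: int) -> float:
    """sum_{k=1}^{h} S_{r*i_k - i_{k-1}}, where S_m = p_1 + ... + p_m and S_0 = 0.

    p must be sorted nondecreasing; every index r*i_k - i_{k-1} must lie in [0, len(p)]."""
    S = [0.0] + list(accumulate(p))
    return sum(S[r * seq[k] - seq[k - 1]] for k in range(1, len(seq)))
```

Because $S_0 = 0$, leading zeros in the sequence contribute nothing, so padding does not change the value.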

For a complete r-ary tree with n leaves, we have $0 = i_0 < i_1 < \cdots < i_h = \frac{n-1}{r-1}$ and, from Lemma 5, $r i_k - i_{k-1} \le n$ for all $1 \le k \le h$.

For technical reasons, because we will be dealing with trees having height at most (but not necessarily equal to) h, we allow initial padding of the sequence by 0s, so that a sequence representing a tree will be of the form $(i_0, i_1, \ldots, i_h)$ with the following properties.

Definition 5: A sequence $(i_0, i_1, \ldots, i_h)$ is a valid (n, r)-sequence if

• $\exists t$ such that $i_0 = i_1 = \cdots = i_t = 0$;

• $0 < i_{t+1} < \cdots < i_h \le \frac{n-1}{r-1}$;

• $r i_k - i_{k-1} \le n$ for all $1 \le k \le h$.

A sequence is complete if it is valid and $i_h = \frac{n-1}{r-1}$. It is straightforward to see that padding the sequence representing a tree with initial 0s does not change the tree built by the Create procedure or the validity of Lemmas 5 and 6.
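Definition 5 can be turned into a direct check. In this Python sketch (hypothetical names) `is_valid` tests the three conditions and `is_complete` adds $i_h = \frac{n-1}{r-1}$, written as $(r-1) i_h = n - 1$ to stay in integer arithmetic. Note that the sequence (0, 3, 4, 5) discussed below passes both tests for (n, r) = (6, 2) even though it represents no binary tree.

```python
from typing import Sequence


def is_valid(seq: Sequence[int], n: int, r: int) -> bool:
    """Definition 5: leading zeros, then strictly increasing values up to
    (n-1)/(r-1), and r*i_k - i_{k-1} <= n for every 1 <= k <= h."""
    if not seq or seq[0] != 0:
        return False
    t = 0
    while t + 1 < len(seq) and seq[t + 1] == 0:   # i_0 = ... = i_t = 0
        t += 1
    tail = list(seq[t:])                          # 0 < i_{t+1} < ... < i_h
    if any(a >= b for a, b in zip(tail, tail[1:])):
        return False
    if (r - 1) * seq[-1] > n - 1:                 # i_h <= (n-1)/(r-1)
        return False
    return all(r * seq[k] - seq[k - 1] <= n for k in range(1, len(seq)))


def is_complete(seq: Sequence[int], n: int, r: int) -> bool:
    """Valid and i_h = (n-1)/(r-1)."""
    return is_valid(seq, n, r) and (r - 1) * seq[-1] == n - 1
```

Running `create_leaf_depths`-style reconstruction on such a sequence is what reveals whether it actually describes a tree; validity alone does not guarantee that.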

We can now extend our cost function to all valid (n, r)-sequences, not just the ones representing trees.

Definition 6: For a valid (n, r)-sequence $\mathcal{I} = (i_0, i_1, \ldots, i_h)$ define

$$\mathrm{cost}(\mathcal{I}) = \sum_{k=1}^{h} S_{r i_k - i_{k-1}}.$$

$\mathcal{I}$ is optimal if $\mathrm{cost}(\mathcal{I}) = \min_{\mathcal{I}'} \mathrm{cost}(\mathcal{I}')$, where the minimum is taken over all valid length-h (n, r)-sequences $\mathcal{I}' = (i'_0, i'_1, \ldots, i'_h)$ with $i'_h = i_h$, i.e., all sequences of the same length that end with the same value.

Note: padding a sequence with initial 0s doesn't change its completeness or cost. Furthermore, if $\mathcal{I}$ is created by padding the sequence corresponding to tree T with initial 0s, then procedure Create will still recreate T from $\mathcal{I}$.

It follows from the definitions that for fixed (n, r) we can calculate H(d, j), the cost of an optimal (n, r)-sequence $(0, i_1, i_2, \ldots, i_d)$ with $i_d = j$, using the DP (2) with

$$c^{(d)}_{ij} = \begin{cases} 0 & \text{if } i = j = 0, \\ S_{ri-j} & \text{if } \max\{0, ri - n\} \le j < i, \\ \infty & \text{otherwise.} \end{cases} \qquad (14)$$

The subtle issue is that not all complete sequences correspond to trees; e.g., (0, 3, 4, 5) is a complete (6, 2)-sequence that does not represent any binary tree. Thus, a priori, finding an optimal complete sequence might not help us find an optimal tree. We are saved by the next lemma.

Lemma 7: An optimal complete (n, r)-sequence always represents a tree.

Thus, we can find an optimal tree by first solving the construction problem for DP (2) with conditions (14) to get an optimal complete (n, r)-sequence $\mathcal{I}$ and then building the tree that corresponds to $\mathcal{I}$.

Before proving Lemma 7 we need to extend our definitions from trees to forests; see Figure 9(a).

Definition 7: A legal (n, r)-forest, or forest, is a collection of complete r-ary trees, all of whose roots are at the same height, that together contain at most n leaves.

Given $p_1 \le p_2 \le \cdots \le p_n$, we can assign the $p_i$ to the leaves of forest F from bottom to top and define the cost of F (with respect to the $p_i$) to be the sum of the costs of its component trees. Note that a tree with n leaves is a forest, and its cost as a forest is the same as its cost as a tree.

Now, for forest F let $i_k$ be the number of internal nodes it has at levels $\le k$. Then we can talk about the sequence $\mathcal{I} = (i_0, i_1, \ldots, i_h)$ associated with the forest. Reviewing the proofs of Lemmas 5 and 6, we see that they were actually statements about forests and not trees, so F has $r i_k - i_{k-1}$ leaves on levels $< k$ and $\mathrm{cost}(F) = \mathrm{cost}(\mathcal{I})$.

We will prove 

Lemma 8: An optimal (n, r)-sequence $\mathcal{I} = (i_0, i_1, \ldots, i_h)$ always represents a forest.

Note that this will immediately imply Lemma 7 because if $\mathcal{I}$ is complete then $i_h = \frac{n-1}{r-1}$ and, by validity, $r i_h - i_{h-1} \le n$, implying $i_{h-1} \ge r i_h - n = i_h - 1$; since also $i_{h-1} < i_h$, we get $i_{h-1} = i_h - 1$. Thus the forest corresponding to $\mathcal{I}$ is composed of exactly $i_h - i_{h-1} = 1$ tree at level h and is therefore a tree itself.

Proof: (of Lemma 8)
Without loss of generality assume that $i_0 = 0 < i_1$. Our proof will be by induction on h.

First note that if h = 1, then $\mathcal{I} = (0, i_1)$ for some $i_1 > 0$, and this represents the forest composed of $i_1$ complete trees, each of height 1, so the lemma is trivially correct.

Now let h > 1. Set $I_h = i_h - i_{h-1}$ and $I_{h-1} = i_{h-1} - i_{h-2}$.
Fig. 9. Illustration of the two cases in the proof of Lemma 8. Here, r = 2 and h = 4. (a) is the forest F' corresponding to the old sequence $\mathcal{I}' = (0, 1, 3, 7)$. (b) illustrates case 1: if $i_h = 10$ then $I_h = 3$ and $2I_h = 6 > 4 = I_{h-1}$, so we can create a forest corresponding to the new sequence (0, 1, 3, 7, 10). (c) illustrates case 2: if $i_h = 8$ then $I_h = 1$ and $2I_h = 2 < 4 = I_{h-1}$. In this case the sequence $\bar{\mathcal{I}} = (0, 1, 3, 5, 8)$ (corresponding to the forest pictured) has cost $S_2 + S_5 + S_7 + S_{11}$. This is cheaper than the cost $S_2 + S_5 + S_{11} + S_9$ of the sequence $\mathcal{I} = (0, 1, 3, 7, 8)$. As noted in the proof, $\bar{\mathcal{I}}$ is constructed by lifting two subtrees in the forest in (a) and then writing down the corresponding sequence.



Define $\mathcal{I}' = (i_0, i_1, \ldots, i_{h-1})$. Since $\mathcal{I}'$ is optimal, by induction $\mathcal{I}'$ represents a forest F' with $I_{h-1}$ roots at level $h - 1$ and a total of $L_{h-1} = r i_{h-1} - i_{h-2}$ leaves. There are now two cases; see Figure 9.

Case 1: $r I_h \ge I_{h-1}$.

Then $\mathcal{I}$ represents a forest with $I_h$ roots whose $r I_h$ children are exactly the $I_{h-1}$ roots from F' and another $r I_h - I_{h-1} \ge 0$ leaves. So the lemma is correct.

Case 2: $r I_h < I_{h-1}$.

We will show that this contradicts the optimality of $\mathcal{I}$ and is therefore impossible. Thus Case 1 is the only possible case, and the lemma is correct.

Assume now that $r I_h < I_{h-1}$ and set $s = I_{h-1} - r I_h > 0$. This can be rewritten as $r(i_h - i_{h-1}) + s = i_{h-1} - i_{h-2}$, so

$$r i_h - i_{h-1} = r i_{h-1} - i_{h-2} - s = L_{h-1} - s.$$

Now consider F' as being labeled with the $L_{h-1}$ smallest $p_i$ and construct a new forest $\bar{F}$ as follows. Choose s trees from F' containing the s largest weights in the forest, i.e., $p_j$ for $j = L_{h-1}, L_{h-1} - 1, \ldots, L_{h-1} - (s-1)$. Move those s trees up one level so that their roots are now at height h and not $h - 1$. Now add $I_h$ new nodes to level h and make them the parents of the remaining $r I_h$ nodes on level $h - 1$. This forest is a legal forest. Call its representative sequence $\bar{\mathcal{I}} = (\bar{i}_0, \bar{i}_1, \ldots, \bar{i}_h)$.

We now observe:

(a) $\bar{i}_{h-1} = i_{h-1} - s$, so

$$\bar{i}_h = \bar{i}_{h-1} + s + I_h = i_{h-1} + I_h = i_h.$$

(b) Thus $r \bar{i}_h - \bar{i}_{h-1} = r i_h - (i_{h-1} - s) = L_{h-1}$ and

$$S_{r\bar{i}_h - \bar{i}_{h-1}} = S_{L_{h-1}} = S_{r i_h - i_{h-1}} + \sum_{j = L_{h-1} - s + 1}^{L_{h-1}} p_j.$$

(c) Let $\bar{F}'$ be levels 0 through $h - 1$ of $\bar{F}$. Since every complete tree contains at least r leaves, the s raised trees contain the s leaves labeled $p_j$ with $L_{h-1} - s < j \le L_{h-1}$ and at least one other leaf. Since every such leaf was raised one level,

$$\sum_{m=1}^{h-1} S_{r\bar{i}_m - \bar{i}_{m-1}} < \sum_{m=1}^{h-1} S_{r i_m - i_{m-1}} - \sum_{j = L_{h-1} - s + 1}^{L_{h-1}} p_j.$$

Combining (b) and (c) shows that $\mathrm{cost}(\bar{\mathcal{I}}) < \mathrm{cost}(\mathcal{I})$. This is a contradiction, since both $\bar{\mathcal{I}}$ and $\mathcal{I}$ are valid sequences of length h that end with the same value $i_h$ and $\mathcal{I}$ is optimal. Thus the case $r I_h < I_{h-1}$ cannot happen, and we are finished. ■



